Skip to main content

Database Design

I. Relational Database Design

Resources

Database design is the process of organizing and structuring data in a logical manner to efficiently store, manage, and retrieve information in a database system. It involves defining the database structure, relationships between data elements, and constraints to ensure data integrity and consistency.

Database design is important for several reasons:

  1. Efficiency: A well-designed database optimizes data storage and retrieval, improving system performance.
  2. Data integrity: Proper design helps maintain accurate and consistent data across the database.
  3. Scalability: Good design allows the database to grow and adapt to changing requirements over time.
  4. Reduced redundancy: It minimizes data duplication, saving storage space and reducing inconsistencies.
  5. Improved data security: Proper design facilitates the implementation of access controls and security measures.

1. Schema Design

A database schema is a formal description of the structure, organization, and relationships of data within a database. It serves as a blueprint for how data is stored and accessed. The schema defines:

  1. Tables and their attributes (columns)
  2. Relationships between tables
  3. Constraints and rules for data integrity
  4. Data types for each attribute
  5. Indexes and other performance optimization structures

There are three levels to schema design. These three levels form a progression from abstract to concrete, with each level adding more detail and specificity.

a. Conceptual

  • This is the highest level of abstraction.
  • It provides a big-picture view of what the database will contain.
  • It focuses on identifying main entities, their attributes, and relationships.
  • Usually represented using an Entity-Relationship Diagram (ERD).
  • Does not include technical details.

b. Logical

  • This is a more detailed representation of the database structure.
  • It translates the conceptual model into a specific data model (e.g., relational, object-oriented).
  • Defines tables, columns, relationships, and constraints.
  • Still independent of any specific database management system (DBMS).
Look up Tables

A lookup table is a data structure used to map, reference, or convert data efficiently. In database design and computer science, lookup tables serve various purposes:

  • speeding up data retrieval
  • standardizing information
  • simplifying data relationships
  • etc.

Example: A typical example is a table that maps status codes to their descriptions. For instance, in a customer order system:

  • 1Pending
  • 2Shipped
  • 3Delivered

This allows the system to store the numeric code in the order table, saving space and standardizing data while the lookup table provides the descriptive text.

CREATE TABLE status_codes (
status_id INT PRIMARY KEY,
description VARCHAR(50)
);

INSERT INTO status_codes (status_id, description)
VALUES
(1, 'Pending'),
(2, 'Shipped'),
(3, 'Delivered');
status_iddescription
1Pending
2Shipped
3Delivered
product_idproduct_nameproduct_status
1Latop1
2Sofa1
3TV3
4Smartphone2

By storing product_status as a numeric code and linking it to the status_codes lookup table, you achieve efficient data storage and maintain consistency.

When you need to update the description of a status, you only need to modify the status_codes table, and the products table will automatically reflect the updated description through joins.

c. Physical

  • This is the lowest level, describing how data is actually stored.
  • It includes specific details for implementation in a particular DBMS.
  • Defines storage structures, file organizations, indexes, access methods, and security measures.
  • Optimized for performance on the chosen DBMS platform.

2. Data Integrity

Data integrity refers to the overall accuracy, completeness, consistency, and reliability of data throughout its lifecycle in a database system. It's a crucial aspect of database design and management that ensures data remains valid, trustworthy, and usable over time.

There are three types of data integrity:

1. Entity Integrity

  • Ensures that each row in a table is a unique entity.
  • Primarily enforced through primary keys.
  • Prevents duplicate or null entries in columns designated as primary keys.
  • Example: In a student database, each student record must have a unique student ID.

2. Referential Integrity

  • Maintains consistent relationships between tables.
  • Enforced through foreign keys.
  • Ensures that values in a foreign key column correspond to existing values in the referenced table's primary key.
  • Prevents orphaned records (records with foreign key values that don't exist in the related table).
  • Example: In an order system, each order must be associated with a valid customer ID that exists in the customer table.

3. Domain Integrity

  • Ensures that all data in a column falls within a defined domain or set of acceptable values.
  • Enforced through various constraints like data types, check constraints, default values, and rules.
  • Maintains the accuracy and validity of data within each field.
  • Example: Ensuring that an 'age' column only accepts positive integers, or that a 'grade' column only accepts values A, B, C, D, or F.
CREATE TABLE students (
student_id INT PRIMARY KEY,
name VARCHAR(100),
age INT CHECK (age > 0), -- Ensures age is positive
grade CHAR(1) CHECK (grade IN ('A', 'B', 'C', 'D', 'F')) -- Ensures grade is within allowed values
);

II. Database Relationship

In a database, relationships define how tables are connected and how data is associated between them.

Parent - Child Tables

Parent and child tables are terms used in relational database design to describe the relationship between two tables connected by a foreign key.

  • Parent Table: A parent table, also known as a referenced table, is the table that contains the primary key which is referenced by a foreign key in another table.
  • Child Table: A child table, also known as a referencing table, is the table that contains the foreign key which references the primary key of the parent table.

The relationship between parent and child tables is typically a one-to-many relationship, where one record in the parent table can be associated with multiple records in the child table.

CREATE TABLE Authors (
author_id INT PRIMARY KEY,
author_name VARCHAR(100)
);

CREATE TABLE Books (
book_id INT PRIMARY KEY,
title VARCHAR(200),
author_id INT,
FOREIGN KEY (author_id) REFERENCES Authors(author_id)
);

Joiner Table

A joiner table (junction, bridge table) is a table in a relational database used to connect two or more tables, typically to resolve many-to-many relationships. Its primary key is usually a composite key composed of foreign keys referencing the primary keys of the tables it's connecting.

Example: We have the employees and clients tables. To express the many-to-many relationship where an employee can work with multiple clients and a client can be served by multiple employees, we create a joiner table called works_with.


+--------+-----------+-------------+
| emp_id | client_id | total_sales |
+--------+-----------+-------------+
| 107 | 400 | 55,000 |
| 101 | 400 | 267,000 |
| 101 | 404 | 22,500 |
+--------+-----------+-------------+

The joiner table works_with has a composite primary key consisting of emp_id and client_id. This combination uniquely identifies each employee-client relationship.

  • emp_id is a foreign key referencing the employees table.
  • client_id is likely a foreign key referencing the clients table.

There are three primary types of relationships:

1. One-to-One (1:1)

In a one-to-one relationship, each record in one table corresponds to exactly one record in another table, and vice versa.

  • Example: Table: users and userProfiles
    • Explanation: Each user has one unique profile, and each profile is associated with one user. The relationship between the tables users and userProfiles is one-to-one.

Designing One-to-One Relationships

Implementation: Usually, you'd have a foreign key in one table referencing the primary key in the other. This foreign key should have a unique constraint.

-- Example Schema
CREATE TABLE users (
user_id INT PRIMARY KEY,
username VARCHAR(50) UNIQUE,
email VARCHAR(100) UNIQUE,
password_hash VARCHAR(255)
);

CREATE TABLE userProfiles (
profile_id INT PRIMARY KEY,
user_id INT UNIQUE,
full_name VARCHAR(100),
bio TEXT,
date_of_birth DATE,
FOREIGN KEY (user_id) REFERENCES Users(user_id)
);

2. One-to-Many (1:M)

In a one-to-many relationship, a record in one table can have multiple associated records in another table, but each record in the second table is associated with only one record in the first table.

  • Example: Tables: users and cards
    • Explanation: A single user can have many cards, but each card belongs to only one user. The relationship between users and cards is one-to-many.

Designing One-to-Many Relationships

Implementation: Add a foreign key in the "many" table that references the primary key of the "one" table.

-- Example Schema
CREATE TABLE users (
user_id INT PRIMARY KEY,
user_name VARCHAR(100)
);

CREATE TABLE cards (
card_id INT PRIMARY KEY,
card_number VARCHAR(10),
user_id INT,
FOREIGN KEY (user_id) REFERENCES Users(user_id)
);

3. Many-to-Many (M:M)

In a many-to-many relationship, records in one table can be associated with multiple records in another table, and vice versa. This relationship is usually implemented using a joiner (or junction/bridge) table.

  • Example: Tables: students and courses
    • Explanation: A student can enroll in multiple courses, and a course can have many students. To manage this relationship, a joiner table, Enrollment, is used to link students and courses. The enrollment table typically contains foreign keys that reference the primary keys of the students and courses tables.

Designing Many-to-Many Relationships

Implementation: Create a joiner table with foreign keys to both main tables.

CREATE TABLE students (
student_id INT PRIMARY KEY,
name VARCHAR(100)
);

CREATE TABLE courses (
course_id INT PRIMARY KEY,
course_name VARCHAR(100)
);

CREATE TABLE enrollments (
student_id INT,
course_id INT,
enrollment_date DATE,
PRIMARY KEY (student_id, course_id), -- composite key
FOREIGN KEY (student_id) REFERENCES Students(student_id),
FOREIGN KEY (course_id) REFERENCES Courses(course_id)
);

4. Self-Referencing Relationship

In a self-referencing relationship, a table has a foreign key that references its own primary key.

This allows you to model hierarchical relationships, such as an employee reporting to a manager who is also an employee.

CREATE TABLE employees (
emp_id INT PRIMARY KEY,
name VARCHAR(100),
manager_id INT,
FOREIGN KEY (manager_id) REFERENCES Employees(emp_id)
);

III. Database Normalization

Database normalization is the process of organizing the structure of a relational database to minimize redundancy and improve data integrity. The goal is to design a database that is both efficient and easy to maintain, by dividing large tables into smaller, related tables and defining relationships between them.

Normalization is typically achieved by following a series of steps called normal forms. Each normal form addresses specific types of redundancy and dependency.

1. First Normal Form (1NF)

Ensures that the table is organized into rows and columns where each column contains atomic (indivisible) values, and each column contains only one type of data.

Example: A table with a column that stores multiple phone numbers in a single field should be split so that each phone number is stored in a separate row or column.

Original table:

CustomerIDNamePhone Numbers
1Alice123-456-7890, 987-654-3210
→ a 1NF compliant table:
CustomerIDNamePhone Numbers
1Alice123-456-7890
1Alice987-654-3210

2. Second Normal Form (2NF)

Builds on 1NF by ensuring that all non-key attributes are fully functionally dependent on the primary key. This means that there should be no partial dependency of any column on the primary key.

A partial dependency occurs when a non-key attribute depends on only part of a composite primary key.

Example: In a table with a composite primary key (e.g., a table tracking student enrollments with student_id and course_id as keys), all other columns must depend on the full primary key, not just a part of it.

Original table:

Student_IDCourse_IDCourse_NameInstructor
1C1MathSmith
1C2PhysicsJohnson
2C1MathSmith
→ a 2NF compliant table: split into 2 tables Enrollments and Courses
Student_IDCourse_ID
1C1
1C2
2C1
Course_IDCourse_NameInstructor
C1MathSmith
C2PhysicsJohnson
C1MathSmith

3. Third Normal Form (3NF)

Further refines 2NF by ensuring that all non-key attributes are not only dependent on the primary key but are also non-transitively dependent. This means no non-key attribute should depend on another non-key attribute.

A transitive dependency occurs when a non-key attribute depends on another non-key attribute, rather than depending directly on the primary key.

Example: Consider a table OrderDetails with the following attributes:

OrderIDProductIDProductNameCategoryCategoryManager
1001P101LaptopElectronicsJohn Smith
1002P102SmartphoneElectronicsJohn Smith
1003P201Desk ChairFurnitureMary Johnson
1004P101LaptopElectronicsJohn Smith

In this table:

  • OrderID and ProductID form the composite primary key.
  • ProductName depends on ProductID.
  • Category depends on ProductID.
  • CategoryManager depends on Category, not directly on the primary key.

This violates 3NF because CategoryManager is transitively dependent on the primary key through Category.

To comply with 3NF, we should split this into two tables: OrderDetails and Categories.

OrderIDProductIDProductNameCategory
1001P101LaptopElectronics
1002P102SmartphoneElectronics
1003P201Desk ChairFurniture
1004P101LaptopElectronics
CategoryCategoryManager
ElectronicsJohn Smith
FurnitureMary Johnson

Now:

  1. OrderDetails contains only attributes that are fully dependent on the primary key.
  2. Categories separates the CategoryManager, which depends on Category, into its own table.